“Simon Fraser University– Tableau”

VAST 2010 Challenge
Hospitalization Records -  Characterization of Pandemic Spread

Authors and Affiliations:

Minoo Erfani Joorabchi, Simon Fraser University, mea18@sfu.ca [PRIMARY contact]
Mona Erfani Joorabchi, Simon Fraser University, mea16@sfu.ca
Dr. Christopher D. Shaw, Simon Fraser University, shaw@sfu.ca
[Faculty advisor]

Tool(s):

Version 5.1 of Tableau was used for solving Mini Challenge 2. Tableau data visualization software was a project in Stanford University in 1997 by Chris Stolte and Professor Pat Hanrahan. They invented a database visualization language called VizQL (Visual Query Language) by combining a structured query language for databases with a descriptive language for rendering graphics. Tableau Desktop was released in 2003. With Tableau, analysts can connect to large amounts of data and auto-generates visualizations including reports, tables, charts and graphs through a drag-and-drop interface. 

 

Microsoft Excel was used for combining the measured attributes of all eleven cities/countries. Again the excel file was inserted to Tableau for further analysing and comparing cities/countries.

 

Video:

 

MC2 Video (.mp4)

 

 

ANSWERS:


MC2.1: Analyze the records you have been given to characterize the spread of the disease.  You should take into consideration symptoms of the disease, mortality rates, temporal patterns of the onset, peak and recovery of the disease.  Health officials hope that whatever tools are developed to analyze this data might be available for the next epidemic outbreak.  They are looking for visualization tools that will save them analysis time so they can react quickly.

 

Approach: We loaded two files of each eleven cities/countries, to Tableau. The provided files were in the form of .csv with varied sizes from 90K records to 7 million records. Using Tableau for MC2 was efficient regarding the process of analyzing the data as well as the work needed to be done to enter the data, in particular for the huge datasets like Karachi and Aleppo. We used the following approach for each city/country e.g., for Colombia there were Colombia.csv and Colombia-deaths.csv with attribute ID in both tables as primary key. As mentioned in the beginning of the video we Left joined the two files (Colombia.csv on the left side and Colombia-deaths.csv on the right) in order to save all the attributes of patients and deaths. Therefore we had a joint table in Tableau for each city/country with ID, Age, Gender, Symptoms, Admittance date and Death date (Death date had valid value for those patients who died and null value otherwise). When we connected Tableau to the text file (because .csv file is read as text file in Tableau), it created the joint table quickly depending on file size e.g., for huge files it took up to 4 minutes to connect to the dataset. After the data import stage, we started analyzing the table by dragging and dropping attributes. Similarly, the processing time for huge datasets took sometimes up to 4 minutes.

 

 

Findings:

 

A set of 92 symptoms were identified in the outbreak. Although not all the patients with these 92 symptoms died, however all of these 92 symptoms caused large number deaths after exactly 8 days (Table1). Since the hospitalization time for patients died in the outbreak was exactly 8 days, we deduced that if a patient in the outbreak could survive more than 8 days he/she then would recover.

The mortality rates of cities/countries which were involved in the outbreak was high, Mean = .097, in comparison to the mortality rate while excluding the patients died in the outbreak, Mean = .0001. Additionally, having a low standard deviation, SD = .0008, indicated that the mortality rates of cities/countries which were involved in the outbreak tended to be very close to the mean (Figure7, Table2).

Patients in the outbreak had hospital admittance date started April 16th until June 29th with a peak in May 13th-21st. Similarly, those who died in the outbreak had hospital admittance date started April 16th-22nd until Jun 18th-26th with a peak in May 14th-20th. Their death date started Apr 24th-May 4th until Jun 24th-30th with a peak in May 22nd-28th in different cities/countries (Figure2, Table3).

Gender and age were not significant factors as they distributed normally in all cities/countries for both patients and deaths (normal curve).

 

 

Further investigation:

 

There were 1,308-1,310 symptoms in each city/country. Figure1 showed number of patients and deaths for all symptoms in Colombia. All the cities/countries which were involved in the outbreak followed the same pattern. There was a set of 92 symptoms had the most number of patients and deaths during May. Then for patients there was a jump to a set of 1,173/1,174 symptoms causing a few numbers of patients (Figure1.B, red arc). Finally there was another jump for the rest of 44/43 symptoms for a very few patients. Considering Figure1.A, after those 92 symptoms there was a jump to the rest of symptoms with a very few number of deaths.

 

Dashboard 2

A.                                        B.

Figure1. Colombia. A) Symptoms which caused death. B) All the symptoms.

 

At first glance, ‘NOSE BLEEDS' had the highest number of people infected which was almost double the rest of 91 symptoms (Figure1, red arrows), however further investigation assisted in grouping them into the following 39 distinct Types of symptoms. Since the numbers of patients (or deaths) caused by these symptoms were quite in the same range (Figure1), we believed in this outbreak ‘Type 21’ was the most common symptom, then ‘Type 3’, 5, 33, 31, 28, etc in (Table1, gray cells). Therefore vomiting, diarrhea, abdominal pain, back pain, fever and nose bleed were among very common symptoms.

 

Table1. Categorizing top 92 symptoms into 39 Types.

Type

92 Symptoms

 1

ABD AND BACK PAIN

ABDBACK PAIN

2

ABD CRAMPING

ABDOMINAL CRAMPING

3

ABD PAIN

ABD PAINABD PAIN

ABD PAINS

ABD PN

ABD. PAIN

ABD.PAIN

ABDOMEN PAIN

ABDOMINAL PAIN

ABDOMINAL PAINABDOMINAL PAIN

4

ABD PAIN FEVER

ABD PAIN, FEVER

ABD PAINFEVER

5

ABD PAIN VOMITING

ABD PAIN, VOMITING

ABD PAIN,VOMITING

ABD PAINVOMITING

ABDOMINAL PAIN VOMITING

ABDOMINAL PAIN, VOMITING

VOMITING ABD PAIN

VOMITING, ABD PAIN

VOMITINGABD PAIN

6

ABD PAIN, NAUSEA

7

ABD PX

8

ABDMNAL PAIN OTH SPCF ST

ABDMNAL PAIN UNSPCF SITE

9

ABDOMINAL PAIN ACUTE

10

ABNORMAL LABS

11

BACK AND LEG PAIN

12

BACK AND NECK PAIN

13

BACK INJ

BACK INJURY

14

BACK PAIN

BACK PAINBACK PAIN

BACK PAINS

BACK PN

15

BACK PAIN STRAIN

BACK PAINSTRAIN

16

BACK PX

17

BACK SPASMS

18

BACK STRAIN

19

CONJUNCTIVITIS RED

20

DIARRHEA

 

 

Type

92 Symptoms

21

DIARRHEA AND VOMITING

DIARRHEA VOMITING

DIARRHEA, VOMITING

DIARRHEA,VOMITING

VOMITING   DIARRHEA

VOMITING  DIARRHEA

VOMITING & DIARRHEA

VOMITING AND DIARRHEA

VOMITING DIARRHEA

VOMITING, DIARRHEA

VOMITING,DIARRHEA

VOMITINGDIARRHEA

22

DIARRHEA FEVER

DIARRHEA, FEVER

23

ENCEPHALITIS

24

FACIAL SWELLING

25

HEAD-FACE-NECK SWELLING

26

HEARING LOSS

27

NOSE

28

NOSE BLEED

NOSE BLEEDNOSE BLEED

NOSE BLEEDS (almost twice as others)

29

PROTEINURIA

30

TREMORS

31

VOMITING

VOMITING ALL DAY

VOMITING ALONE

VOMITTING

VOMTING

32

VOMITING  BLURRED VISION

33

VOMITING  FEVER

VOMITING AND FEVER

VOMITING FEVER

VOMITING, FEVER

VOMITING,FEVER

VOMITINGFEVER

34

VOMITING BLOOD

VOMITING BLOODVOMITING BLOOD

VOMITTING BLOOD

35

VOMITING DIARRHEA FEVER

VOMITING, DIARRHEA, FEVER

VOMITING,DIARRHEA,FEVER

36

VOMITING RASH

37

VOMITING, NECK PAIN, HEADACHE, BLURRED VISION

38

VOMITING,STOMACH HURTS

39

VOMITINGHEADACHE

 

 
 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Following the lead that these 92 symptoms were likely to be the symptoms in the outbreak, we filtered the data to visualize only the patients in the outbreak. One could recognize a peak in May on the hospital admittance date of patients and deaths in the outbreak (Figure2.B.Bottom) which slightly affected the hospital admittance date of all patients (Figure2.B.Top). On the other hand all deaths (Figure2.A.Top) were influenced hugely by the deaths in the outbreak (Figure2.A.Bottom). The peak for deaths of the outbreak was late May.

 

Hospital admittance of deaths in outbreak; they died in mid May and early June

 

Hospital admittance of patients in outbreak

 

Dashboard 2

 

Jump Time

 

Peak Time

 

Peak Time

 

Jump Time

 
Dashboard 2

                                                              A.                                                                                                                            B.

Figure2. Colombia. A.Top) Deaths records A.Bottom) Deaths of the outbreak (92 symptoms) B.Top) Hospital admittance of patients (blue) and hospital admittance of deaths (orange). B.Bottom) Hospital admittance of patients of the outbreak (blue) and hospital admittance of deaths of the outbreak (orange, green and red).

 

Regarding recovery of the disease, we found out all the deaths of the outbreak died exactly after 8 days. Figure3.A displayed the correlation between hospital admittance dates versus death dates of whole deaths in Colombia; it also compared deaths in the outbreak (Figure3.B) with deaths died for other reasons than the outbreak (Figure3.C). We noticed the huge mass disappeared in Figure3.C. 

 

Turkey-All

 

Dashboard 2

 

 
                                         

 

                                                                                                        A.                                                                                             B.                                                                                           C.

Figure3. Deaths in Colombia. A) Hospital admittance date vs. death date. B) Hospital admittance date for those died in the outbreak vs. death date. C) Hospital admittance date for those who not in the outbreak vs. death date.

 

To ensure that the symptoms of the outbreak are only limited to these 92 symptoms, we checked a hypothesis that these top 92 symptoms could be severe ones, and the next jump to 1,173/1,174 symptoms in Figure1 could be the moderate ones which didn’t cause major deaths. However, investigating the rest of symptoms showed that there was no peak or temporal patterns on hospital admittance for those (Figure4).

 

Dashboard 2

 
other symptoms than 92

Figure4. Patients in Colombia excluding ones with 92 symptoms. Top) Hospital admittance. Bottom) Deaths date.

 


MC2.2:  Compare the outbreak across cities.  Factors to consider include timing of outbreaks, numbers of people infected and recovery ability of the individual cities.  Identify any anomalies you found.

 

Findings:

 

In an attempt to compare the outbreak across cities, we divided cities into two groups:

·         First group was the 9 cities/counties of Aleppo, Colombia, Iran, Karachi, Lebanon, Nairobi, Saudi Arabia, Venezuela, and Yemen which behaved similarly, with a large number of deaths.

·         Second group was two counties of Thailand and Turkey, with a little number of deaths.

 

Since all the 9 cities/countries in the first group were similar for different factors, namely patterns of the symptoms in the outbreak, mortality rates, hospitalization time for deaths, timing of outbreaks, etc., we believed the cities/countries in first group were mainly affected by the pandemic and countries in the second group, Thailand and Turkey, were unaffected by the disease.

 

Further investigation on the diagrams of date of deaths and hospital admittance of patients and deaths in the 9 cities/countries of first group, we identified that Aleppo, Yemen, Karachi, Colombia and Nairobi were quite identical (had an exact normal curve) while Lebanon, Venezuela, Iran and Saudi Arabia were also identical (they had still normal curve with a small curve, belly at the beginning of the normal curve) as it shown in the video. The outbreak started Apr 16 until  Jun 29 for the countries in the first groups. Comparing the timing of outbreaks, we identified the peak of the disease was a bit varied starting from May 13 to May 21 in different cities/countries and it was roughly in the following order: Kenya (Nairobi), Syria (Aleppo), Lebanon, Yemen, Pakistan (Karachi), Saudi Arabia, Venezuela, Colombia and Iran.

Additionally the deaths date in the outbreak started around [Apr 24-May4] until [Jun 24-30] in the 9 countries. Also there was a jump in their diagrams which was shown the main start for the deaths in the outbreak (Figure2.A.Bottom). Both jump time and peak time for deaths date in the outbreak followed the same order as highlighted above (Figure7, Figure8). Thus we concluded that Kenya (Nairobi), Syria (Aleppo), and Lebanon first got the disease, had the first peak for patients and then deaths. Then Yemen, Pakistan (Karachi), and Saudi Arabia were in the middle and finally Venezuela, Colombia and Iran were the last ones.

 

 

Further investigation on Turkey and Thailand:

 

Regarding the symptoms, all the 9 countries in the first group followed the same pattern for symptoms as shown in Figure1. However number of patients and deaths of different symptoms in Thailand and Turkey were hugely different (Figure5). There were a couple of symptoms with most occurrences, 7 symptoms in Turkey and 6 symptoms in Thailand. Then there was a jump to rest of symptoms causing little number of patients (Figure5.B).

 

7 common symptoms in Turkey were 'nose bleeds', 'headache, blurred vision', 'LT hand injury', ' tonsil bleed', 'elevated temp', 'LT arm pain', and 'rash on hands'. Only ‘headache, blurred vision' caused to death. Similarly in Thailand 6 symptoms of 'LT hand injury', ' tonsil bleed', 'elevated temp', 'headache, blurred vision', ' nose bleeds' and 'broken tooth' were the highest symptoms for the patients which none of them caused to death. Although headache, blurred vision, and nose bleeds were in common with the 92 symptoms in the outbreak but not the same specific pattern in the outbreak were identified in the patients and deaths for Thailand and Turkey (Figure6).

 

Dashboard 2

 

Symptoms

 
Turkey-All               

                             A.                                                        B.

Figure5- Turkey. A) Symptoms which caused death. B) All the symptoms.

 

Moreover the mortality rates in Thailand and Turkey were .0009 which was similar to the mean mortality rates in other countries while excluding deaths in the outbreak (Mean = .0001). It was also hugely different from the mean mortality rate in the outbreak which was .097 (Table2).

 

Additionally the hospitalization time for deaths in Turkey and Thailand were 4.9 which was similar to hospitalization time for deaths in the rest of the countries while excluding deaths in the outbreak (Mean = 5.17). it was also very different from the hospitalization time for deaths in the outbreak which was exactly 8 days (Table2).

 

Turkey-All

 

Dashboard 2

 
Turkey-All2 Turkey-All

                                                            A.                                                                                            B.

Figure6- Turkey. A.Top) Hospital admittance of patients (blue) and hospital admittance of deaths (orange). A.Bottom) Hospital admittance of patients (blue) and hospital admittance of deaths (orange). B) Hospital admittance of deaths vs. their date of death.

 

 

Further investigation on the outbreak across cities/countries:

 

The number of patients and deaths in the outbreak for countries may not be as useful as mortality rate because we did not have the whole number of population, Figure7 compared number of patients and deaths in the outbreak color coded by their admittance date. Figure7 also compared the mortality rates of outbreak color coded by their peak of death time for all the cities/countries. In terms of mortality rate, we noticed that Syria (Aleppo) and Kenya (Nairobi) had the highest mortality rate while Saudi Arabia and Lebanon had the lowest. It also displayed the peak time was started earliest in Syria (Aleppo) and Kenya (Nairobi) while latest in Colombia and Iran (Table3). 

 

Dashboard 1

 

Dashboard 1

 

Dashboard 1

 
Dashboard 1

Figure7. Numbers of patients of outbreak, numbers of deaths of outbreak and mortality rates of outbreak.

 

Although Figure8 compared the jump time and peak time of deaths date as well as peak time of hospital admittance date for patients and deaths in the outbreak, they followed almost the same order as mentioned above: Kenya (Nairobi), Syria (Aleppo), Lebanon, Yemen, Pakistan (Karachi), Saudi Arabia, Venezuela, Colombia and Iran.

 

Dashboard 9

 

Dashboard 9

 

Dashboard 9

 

Dashboard 9

 
Dashboard 9

Figure8. Day of Jump time and Peak time for deaths of outbreak.

 

Table2 and Table3 displayed more facts and figures of different attribute in the cities/countries.

 

Table2. Hospitalization time for deaths of outbreak and non-outbreak, Mortality Rate of outbreak and non-outbreak and No. of Symptoms of deaths across cities/countries.

Country

Mean Hospitalization Time for deaths

No. of Symptoms

of deaths

Outbreak

Non-outbreak

Outbreak

Non-outbreak

Mortality Rate

#Patients

#Deaths

Mortality Rate

#Patients

#Deaths

Nairobi

8

5.412

679

.1080

398,406

43,065

.00100

892,653

894

Aleppo

8

5.378

942

.1090

705,856

76,993

.00109

1,536,792

1,679

Lebanon

8

5.161

397

.0865

84,367

7,304

.00095

356,822

342

Yemen

8

5.371

298

.0999

74,842

7,479

.00102

226,945

232

Karachi

8

5.195

1,280

.0967

1,651,586

159,866

.00104

5,503,338

5,739

Saudi Arabia

8

5.042

809

.0844

242,130

20,439

.00100

1,085,432

1,090

Venezuela

8

5.079

187

.1002

36,060

3,616

.00093

107,746

101

Colombia

8

5.160

544

.0968

162,698

15,751

.00106

542,583

580

Iran

8

5.209

449

.0943

121,812

11,496

.00102

421,436

430

Thailand

-

4.919

85

-

-

-

.00097

88,457

86

Turkey

-

4.961

255

-

-

-

.00092

303,987

280

Average

8

5.171

-

.0973

-

-

.0001

--

-

 

Table3. Date of deaths of outbreak and hospitalization time for both patients and deaths of outbreak.

Country

Outbreak- Date of deaths

Outbreak- Hospital admittance date

Patient

Deaths

Start Time

Jump Time

Peak Time

End Time

Start Time

Peak Time

End Time

Start Time

Peak Time

End Time

Nairobi

Apr 29

May 4

May 22

Jun 24

Apr 16

May 14

Jun 29

Apr 17

May 14

Jun 18

Aleppo

May 1

May 5

May 23

Jun 25

Apr 16

May 13

Jun 29

Apr 20

May 15

Jun 18

Lebanon

Apr 24

May 6

May 25

Jun 30

Apr 16

May 16

Jun 29

Apr 20

May 17

Jun 18

Yemen

May 1

May 6

May 24

Jun 27

Apr 16

May 16

Jun 29

Apr 19

May 16

Jun 21

Karachi

May 2

May 6

May 24

Jun 26

Apr 16

May 17

Jun 29

Apr 17

May 17

Jun 25

Saudi Arabia

May 3

May 7

May 26

Jun 28

Apr 16

May 19

Jun 29

Apr 20

May 18

Jun

Venezuela

Apr 25

May 8

May 27

Jun 28

Apr 16

May 19

Jun 29

Apr 21

May 19

Jun 20

Colombia

May 4

May 9

May 28

Jun 29

Apr 16

May 20

Jun 29

Apr 22

May 20

Jun 23

Iran

May 2 

May 8

May 27

Jun 29

Apr 16

May 21

Jun 29

Apr 24

May 19

Jun 25

Thailand

-

-

-

-

-

-

-

-

-

-

Turkey

-

-

-

-

-

-

-

-

-

-

Average

 

 

 

 

Apr 16

 

Jun 29